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Summary 


The discrete Fourier transform (DFT) is a pervasive tool in scientific and engineering calculation. 
It lies at the heart of many methods for solving partial differential equations, data analysis, statistical 
calculations, and of course Fourier Analysis (see ref 1). More than one hundred papers on fast 
algorithms have been published! So, why another? Most DFT algorithms assume use of high 
speed computer memory that is local and contiguous; others assume the data is stored in blocks of 
external - remote - memory. In contrast, the efficient DFT algorithm provided in this paper 
assumes neither case; instead, it works efficiently on computers with virtual or cache memory 
where the location of the data is mapped automatically onto a small high speed memory by the 
computer’s operating system. Important features of the algorithm are. 

• no details of page mechanism to be used 

• the code should be compact and clear 

• no auxiliary storage should be required 

• efficient execution when the data resides entirely in high speed memory. 

Additionally, a matrix theory of the DFT is given. This theory provides the basis of the “fast” 
calculation of the transforms and the proofs give the insight to efficient organization of the calculation 
under different assumptions. 

Implementing DFT’s on Virtual Memory Machines 

The DFT is most directly expressed as the product of a matrix with a vector. If (x p x 2 , . . . , x n ) 
denotes the data to be transformed, 0) n =exp(27ti/n), and 
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then a = W * x is the discrete Fourier transform of x. If a super bar is used to denote complex 
conjugate, 1 is the inverse discrete Fourier transform of x. W n is called the discrete Fourier 

operator. 
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Appendix B provides decompositions of the matrix r H , n as a product of sparse matrices. These 
decompositions provide a framework for implementing versions of fast DFTs. The results are not 
new, but the details of the proofs provide computational insight. 

DFT algorithms to evaluate (1) are based on writing 'W' as a product of sparse matrices which 
are then sequentially applied to x (ref 2, ref 3, ref 4.). The decomposition is not unique, so we may 
choose one that has properties that we find important. 

The salient feature of computers with hierarchies of memory is that transfers of data between 
different levels is by blocks rather than individual words. Therefore, more complete use of data in 
a block yields a premium in inter-memory traffic. The features that we wish to emphasize, then, 
flow from a high locality of data. One simple approach is to access data in sequentially increasing 
order and to use every datum as much as possible before a block is discarded. Unfortunately, the 
latter is difficult to achieve without detailed knowledge of block sizes, but, careful use of the 
former yields almost all of the efficiency gain. 

The following discussion assumes that the calculation overwrites the data to be transformed and 
the vector length is a power of two. Appendix B decomposes the DFT into a two phases. One 
scrambles the data; the other converts the data into its Fourier coefficients. 

In the scrambling phase, usually the binary reversal of an index is calculated directly and the 
two elements of the vector are exchanged. A number and its binary reversal usually occur in 
different parts of the vector. The binary reversal of two successive integers are in different halves 
of the vector; even integers have binary reversals in the lower half, odd integers have their binary 
reversal in the upper half. So potentially almost as many block transfers occur as half the number 
of elements in the vector. Direct application of the unscrambling matrix decomposition requires one 
sequential pass through the data for each factor. Thus, total data motion is considerably reduced. 
When the vectors are long and real memory is small, the overhead from data motion is reduced 
considerably. 

During the other phase of the DFT, the computation can be organized in several ways, one is to 
apply a factor of the decomposition to the same element in each block. This minimizes the computation 
of trigonometric function values (which are mostly calculated by recurrence relations), but does not 
address the data in a strictly sequential fashion. This organization is advantageous for machines 
with vector processing capabilities. Applying a factor of the decomposition on a complete block by 
complete block basis leads to strictly sequential access of data at the cost of repeated calculation of 
trigonometric function values. However, the decrease in overhead from data motion can often more 
than compensate this cost. This organization is also more suitable for machines with multiple 
processors than the one above. 

The algorithm given in Appendix A is based upon the factorization in equation (15b). The 
transformation phase is based upon base four transforms rather than purely base two transforms 
because there is an increase in computational efficiency without complication to the coding of the 
algorithm. 
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APPENDIX A - Fortran Code 


SUBROUTINE VMDFT (N, A, IER) 


VIRTUAL MEMORY DISCRETE FOURIER TRANSFORM I 


THIS ROUTINE EVALUATES I 

I 

N-l JK I 

X(J) = SUM (A(K)W ), J = 0 (1) N-l I 

K=0 I 

I 

WHERE W = EXP (2*PI*I/N) , I = SQRT(-l), N IS A POWER OF 2 I 


FORMAL PARAMETERS : I 

I 

N INTEGER, IN. LENGTH OF A I 

I 

A COMPLEX VECTOR OF LENGTH N. AT ENTRY, A CONTAINS THE I 

DATA TO BE TRANSFORMED. AT EXIT, IT CONTAINS THE I 

TRANSFORMED DATA. I 

I 

IER INTEGER, OUT. ERROR INDICATOR I 

IER = 1 IF NO ERROR HAS BEEN DETECTED I 

IER =2 IF N .LE. 0 ON ENTRY I 

IER = 2 IF N IS NOT A POWER OF 2 I 


CACM CATOGORY: J1 I 


KEYWORDS: FOURIER TRANSFORM, VIRTUAL MEMORY, BINARY REVERSAL I 


THE ALGORITHM IS A VARIATION OF THE COOLEY-TUKEY ALGORITHM. I 

I 

1) THE TRANSFORMATION IS DONE IN PLACE I 

2) NO AUXILIARY TABLES ARE REQUIRED I 

3) REQUIRED TRIGONOMETRIC VALUES ARE USED IN NORMAL ORDER I 

AND ARE GENERATED FROM STABILIZED RECURRENCE RELATIONS I 

4) NO BINARY REVERSE COUNTER IS REQUIRED I 

5) THE DATA IS ALWAYS ACCESSED IN MONOTONE INCREASING ORDER I 

6) THE ORGANIZATION HAS A HIGH LOCALITY OF DATA I 

7) THE MINIMUM NUMBER OF PASSES OVER THE DATA IS REQURIED I 

8) THE ALGORITHM NEEDS NO PAGE SIZE INFORMATION, BUT THERE I 

MUST BE AT LEAST 4 DATA PAGES PLUS ROOM FOR THE CODE FOR I 
IMPROVEMENT TO SHOW I 

9) THE CODE IS CLEAR AND COMPACT I 

10) THE COST IN CPU TIME OVER THE MOST EFFICIENT DFT ALGORITH I 

IS SMALL I 


USAGE : I 

I 

TO EVALUATE THE FOURIER TRANSFORM OF THE N = 2**M DATA STORED I 
IN THE COMPLEX VECTOR A: I 

I 

CALL VMDFT (N, A, IER) I 

c 
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c 

C 

C- 


c 

c 

c 


c 

c 

c 


c 


10 


20 


PARAMETER DECLARATIONS 


INTEGER N , IER 
COMPLEX A (N) 


INTERNAL DECLARATIONS 


COMPLEX AO , Al , A2 , A3, S, T 

REAL Cl, C2 , C3 , DC, DS, FOUR, HALF, ONE, QRTR, RAD 

REAL SI, S2 , S3, THRHLF, TI, TR, TWO, ZERO 

INTEGER K, KB, KK, KP, KS, KSH , KSP, KSPM, KSPP, KST 
INTEGER KO, Kl, K1MX, K2 , K2MX, K3, M, MM, MP, NL 


CONSTANTS 


DATA ZERO /0.0/, ONE / 1.0/, TWO /2.0/ 

DATA FOUR /4.0/, HALF /0.5/, QRTR /0.25/, THRHLF /1.5/ 
RAD * FOUR * ATAN (ONE) 


STAGE 0 - TEST INPUT AND INITIALIZE I 


IER = 1 
NL = N 

IF (NL .LE. 0) RETURN 

IER = 2 

»> COMPUTE LOG BASE 2 OF N 

M = 0 
MM = 1 

IF ( MM .GE. NL) GOTO 20 

MM = MM + MM 
M = M + 1 
GOTO 10 

IF ( MM .GT. NL) RETURN 

IER = 0 
MP = M / 2 
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C 

C STAGE 1 - SCRAMBLING THE DATA I 

c 

C BINARY REVERSAL BY SEQUENTIAL EXCHANGE OF SYMMETRICALLY LOCATED I 

C BITS REQURIED M/2 SEQUENTIAL PASSES OVER THE DATA AND HAS MUCH I 

C LOWER OVERHEAD FOR MACHINES WITH PAGED MEMORY THAN USING A I 

C BINARY REVERSAL COUNTER I 

c 

C THE CODE IS A VERY SLIGHTLY MODIFIED VERSION OF THE SUBROUTINE I 

C "UNSBIN" GIVEN IN I 

C POLGE, R. J., B. K. BHAGAVAN, AND J. M. CARSWELL, "FAST I 

C COMPUTATIONAL ALGORITHMS FOR BIT REVERSAL, " IEEE TRANSACTIONS I 

C ON COMPUTERS, C-23(l), PP. 1-9. I 

C 

KSP *= 1 
KSH - NL 


C »> DO (M/2) PASSES OVER THE DATA 

DO 140 KP = 1, MP 

KSPP = KSP + 1 
KSPM = KSP - 1 
KS = KSH 
KSH = KSH / 2 

KST = KSH - KSP 

KSP = KSP + KSP 


c >»STEP THROUGH THE FILE SEGMENTS 

DO 130 K0 = KSPP, NL, KS 
K1MX = K0 + KST 

C »>STEP THROUGH BLOCKS IN EACH FILE SEGMENT 

DO 120 K1 = K0, K1MX, KSP 
K2MX = K1 + KSPM 

C »>SWAP TWO BLOCKS OF DATA 

DO 110 K2 = Kl, K2MX 
K3 = K2 + KST 
S = A (K2 ) 

A (K2 ) = A (K3) 

A (K3) = S 


110 CONTINUE 

120 CONTINUE 

130 CONTINUE 

140 CONTINUE 
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c 

c 

c- 

c 

c 

c 

c 

c 

c- 

c 

c 

c 

c 

c 

c 

c 

c 


210 


c— 

C-- 

215 


C 

C 

C 

C 

c 


STAGE 2 - THE TRANSFORMATIONS 


RADIX 4+2 FAST FOURIER TRANSFORM ROUTINE USING A COOLEY-TUKEY 
LIKE ALGORITHM FOR BINARY REVERSED DATA. THE ORGANIZATION USED 
REQUIRES THE MINIMUM NUMBER OF SEQUENTIAL PASSES THROUGH THE 
DATA AND ALLOWS THE SINE-COSINE VALUES TO BE GENERATED IN 
SEQUENTIAL ORDER AS WELL. 


THE CODE IS ADAPTED FROM THE ALGOL PROCEDURE "REVFFT 4 " FROM 
CACM ALGORITHM 345, "ALGORITHM 345, AN ALGOL CONVOLUTION 
PROCEDURE BASED ON THE FAST FOURIER TRANSFORM," BY RICHARD C, 
SINGLETON. 


IF M IS ODD THEN DO A RADIX 2 TRANSFORM 


KSP = 1 

IF ( 2 * MP .EQ. M) GOTO 215 
DO 210 K0 = 1, NL, 2 
K2 = KO + 1 
AO = A (K2 ) 

A (K2 ) = A (K0 ) - AO 
A {KO ) = A ( KO ) + AO 
CONTINUE 
KSP = KSP + KSP 
RAD = HALF * RAD 


NOW DO M/2 RADIX 4 TRANSFORMS 


IF { MP .LT. 1 ) RETURN 
DO 250 KP = 1, MP 

RAD = QRTR * RAD 


NOTE: SIN-COS REFERENCES 
CAN BE REPLACED BY 
INLINE APPROXIMATIONS 


DC = TWO * SIN (RAD) ** 2 
DS = SIN (TWO * RAD) 

KST = 4 * KSP 


I 

I 

I 

I 

I 

I 

I 

I 

I 

I 

I 


7 



c 


c 


c 

c 

c 


c 

c 

c 

c 

c 

c 


c 


225 


230 

240 

250 


DO 240 KB = 1, NL, KST 
Cl = ONE 
SI = ZERO 


DO 230 KK = 1, KSP 
K = KK - 1 
KO = KB + K 
K1 = KO + KSP 
K2 = K1 + KSP 
K3 = K2 + KSP 
AO ■= A (KO ) 

A2 = A (K1 ) 

A1 = A (K2 ) 

A3 = A (K3 ) 

IF ( K .EQ. 0 ) 


ADVANCE TRIG FUNCTION VALUES 


C2 - Cl - (DC * Cl + DS * SI) 
SI = SI + (DS * Cl - DC * SI) 


THE FOLLOWING 3 STATEMENTS COMPENSATE 
FOR TRUNCATION ERROR. IF ROUNDED 
ARITHMETIC IS USED, REPLACE THEM WITH 
Cl = C2 


Cl = THRHLF - HALF * (C2 * C2 + SI * SI) 

51 = Cl * SI 
Cl = Cl * C2 

C2 = Cl * Cl - SI * SI 

52 = TWO * SI * Cl 

C3 = C2 * Cl - S2 * SI 

53 = C2 * SI + S2 * Cl 

»>APPLY RADIX 4 TRANSFORM 

A2 = A2 * CMP LX (C2 , S2) 

A1 = A1 * CMPLX (Cl , SI) 

A3 = A3 * CMPLX (C3, S3) 

S = (AO + A2) 

T = (A1 + A3) 

A (KO ) = S + T 
A(K2) = S - T 
S = (AO - A2) 

T = (A1 - A3) 

TR = REAL ( T ) 

TI = AIMAG(T) 

T = CMPLX (-TI, TR) 

A (K1 ) = S + T 
A (K3) = S - T 
CONTINUE 
CONTINUE 
KSP = KST 
CONTINUE 

RETURN 

END 


BLOCK BEING TRANSFORMED 


TRANSFORM THE BLOCK 


GOTO 225 
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APPENDIX B - ALGEBRAIC THEORY 


DFT Identities 

This appendix gives the mathematical background and required identities for the DFT viewed as 
a matrix operator. We begin with some definitions. The results have been previously published, 
but the development is more complete than published results and the actual development is more 
directly useful in teasing out details for actual implementation. 

A permutation matrix is any matrix which can be obtained from an identity matrix by 
permuting its rows or columns. Let the set of numbers (c(j)}, 0<j<p-l,bea permutation of 
the integers 0 ... p - 1, and let {r(j)) be the inverse permutation: 

r(c(j)) = c(r(j)) = j 

The permutation matrix P, of order p, is defined by 

Pj k = 5(j,r(k)) = 8(c(j),k), 0 < j, k < p - 1 . 

The functions r(j) and c(j) are called the row and column functions, respectively. Thus the 
matrix has just one non-zero element (equal to unity) in each row and column. The unit element in 
row j is c(j), and the unit element in column k is in row r(k). 

The following properties of permutation matrices follow directly from the definition: 

( 1 ) The transpose is the inverse: P T = P" 1 

(2) If P, and P 2 are permutation matrices of the same order with row functions r, and r 2 and 
columns c, and c 2 , then their product P = P,P 2 is a permutation matrix with row and column 
functions given by the transitive formulas 

r (j) = r i [r 2 (j)] 

cG) = c 2 [c,(j)] 

(3) IF X is an arbitrary matrix, the effect of premultiplying by a permutation matrix is to permute 
the rows, while post multiplying permutes the columns: 

Let 

Y = PX 
Z = XP 

Then 

*,„**»„ 

Zjk = Xj r(k)* 

In particular, if 

U = XP T 
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That is, premultiplying by P produces a petmutation of the rows, postmuldplying by the transpose 

P T produces the same permutation of the columns. 

Two important permutation matrices are the square deal S p q and its transpose (and inverse 
the perfect shuffle, T q p , each of order pq, defined by the the row and column functions 

c s (j + lq) = jp + 1 = r T (j + lq) 
r s (jp + 1 )=j + 1 q = c TCip + 1 ) 
with 0<j<q-1.0<k<p-l. 


Premultiplying by S represents dealing a deck of pq cards to p playets, while premultiplying by T 
represents shuffling (merging) p decks of q cards each. The terminology is actually somewhat 
ambiguous, since the roles are reversed in postmultiplicauon. Note the identities 


Sq, p - Tp, q - Sp, q - Sp) q 
Sp, p = Tp, p 

Sp, i = Si, p = Tp, i = T p , i - Ip 

The tensor product of two matrices, A and B, denoted A®B, is the partitioned block matrix 
obtained by multiplying each element of A by the matrix B 


A®B = (A, m B) 

more specifically, if A is of order p and B is of order q, and C = A®B, then C is of order pq, with 
elements given by 


+ lq, k + mq 


The following properties follow directly from the definition . 
Associative Law: 

(A®B)®C = A®{B®C) = A®B®C 


Similarity Law: If A is of order p and B is of order q, then 



B ® A =Tp, q (A ®B)Sp,q 

(3) 

Transpose: 

(A®B) t = A t ®B t 

(4) 

Inverse: 

(A®Br ! = A -1 ®B -1 

(5) 


A®li = Ii®A = Al 

(6) 

Identity: 

Ip®Iq = Ipq 1 
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Multiplicative Law: 


(AB)®(CD) = (A®C)(B®D) 


( 7 ) 


Distributive Law: I ® nA J =n<i®A,wnA,)®.=n (Aj®I) (8) 

j j V j / j 

where the Aj are all of the same order, I is an identity matrix of arbitrary order, and denotes the 

j 

ordinary product of a finite number of factors. 

Basic Factor Theorem If P is of order p and Q is of order Q. Then 


P®Q = (PIp)® (IqQ) 

= (P®y(I p ®Q) 
= (IpP) ® (Qlq) 

= (i p ®Q)(p®y 


( 9 ) 


Thus, the tensor product can be expressed as an ordinary (commutative) product), each factor 
being a tensor product with an identity matrix. 

If P and Q are permutation matrices, then so is their tensor product A = P®Q, and its column 
function is 

c A (j + lq) = CqG) + q c p (l), 0 < j < q, 0 < 1 < p 


and its row function is 


r A (k + mq) = rq(k) + qr p (m), 0 < k < q, 0 < m < p 

since 

A - p o 

j + lq, k + mq 1, m v j, k 

The tensor product has a simple representation when one factor is a permutation matrix and the 
other is an identity matrix. Let P be a permutation matrix of order n with column function c(j), and 
let I, be the identity matrix of order s. The matrices 


Q = P®I 8 
R = I S ®P 
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are permutation matrices with column functions 


CQ(jS+l) = SC(j) + r 
c R = In + c(j) 


0< j < n 
0 < 1 < s 


So, if P is regarded as the operator that shuffles (permutes) a deck of cards, then the effect of Q is 
to perform the same permutation on a deck of "packets", each packet containing s cards that are 
stuck together. Thus, for example, the matrices 


Sp ( q ® Is 
Tp,q ® Is 


( 10 ) 


can be described as the "dirty deal" and the "sticky shuffle" respectively. On the other hand, the 
effect of R is to shuffle s decks in parallel and then stack them together. Combining these gives the 
formula for the triple product: 

Let B = I p ® Pq ® I r where P q is a permutation matrix of order q with column function c P (j)- Then 
B is a permutation matrix of order pqr with column function Cj,(j) given by 

c B (aqr + (Jr + y) = aqr + rcp(p) + y 

forO<a<p, 0<p<q, 0<y<r 


More generally, if A, B, C are arbitrary matrices of order p, q, r respectively, then the elements of 


the triple product 


D = A®B®C 


are given by 


Diqr + kr + m. jqr + Ir + n ~ Ai, jBfc, ]C m, n 
0 < i, j < p; 0 < k, 1 < q; 0 < m, n < r 


Even Order DFT 

The reduction of the Fourier matrix of even order M = 2q to a product of simpler matrices 
depends upon the simple observation 
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= "q k | 0 < j <q 

< j+1)k = «| 0 - k<2C l 


That is, the even numbered rows of contain the elements of 'W ; q , while the elements in the 
odd numbered rows are products of elements from both and This suggests permuting the 
rows of to bring the even numbered rows together in one block and the odd numbered rows 
into another. This can be done by using S iq . Define V = S iq ‘W / 2q . The elements of V are 


V .. = to * 
j* q 

V , = co ^ co 

T j + q, k q 


k 

2q 


for 0 < j, k < q. The columns can be partitioned by noting 


(O q J<k * q) = co q * 
co^^-co^ 


for 0 < j, k < q. Thus the elements of V are 




= CD* 


F j. k ♦ q 


= CD * 


V . = (D *CD-, 1 

j + qjc w 2q 


: .V J+q , k + q = -co/co 2q k 


for 0 < j, k < q. Define 


1 


x 


0 


D<*x) = 


X 2 


,D q (l)=I q ,D 1 (x) = Ii 


0 


X 1 !" 1 
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Then 


V = 


% % 
< WqDq(C02q) -St'qDq^q) 


*Wq 0 

[ Is 

0 

0 < Wq 

0 

D</0>2q) 


Iq Iq 

Iq Iq J 


which using the results from the previous section can be written compactly as 


W 2q =T 2 , q (l 2 ®lVq)A 2 .q(^®Iq) 


Where 



Is 

0 


0 

Dc/t^q) 


and W 2 = 


1 1 
1-1 


(11) 


The Fundamental Factor Theorem 

From the above, the DFT of M = 2q points can be written as a product of sparse matrices. If 
the integer q is, itself, a power of two, then the above can be iterated to yield the complete 
Cooley-Tukey algorithm. However, the process generalizes to general factors. 

Let 


^p.q - 


D<^(Opq 


0 


Dq(c02 q ) 


0 


dKt 1 ) J 
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Note that 


(12) 


V ~ Tp.q Ap, q Sp, q 


and reduces to the identity if either subscript is unity 


Ap, i — Ai_ p — Ip 


Fundamental Factor Theorem 

= T p , q (Ip 0 W q ) A p , q (W p 0 Iq) 

alternatively, 

*^tpq = ( 'Wq 0 Ip) Aq_ p (Iq 0 Up) Tp, q 


(13a) 

(13b) 


Proof: We first prove (13a). By analogy with the even order case. Start by applying the square deal 
to W 


V = s w 
p.q pq 


The elements of V are 


pq- 1 


Vj + Iq, k + mq — Sp,q^ + lq r ^ *' k + "“P 1’ ^ ~^1» 0 — 1> m < p. 


But 


so that 


but recalling that 


the identities 


+ lq, n “ ^ n ’ Cs (j + ^)] 

= 5[n, jp + 1] 


Vj + lq, k + mq — ^jp + I, k + mq 

= fnOP +| )( k + *«l) 

«Jpq 


cop, = e 2ni H 


«P q = (Oq 
COgq = C0 q 

( °pq =1 


give 
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Vj + lq, k + mq — Wq C ^ | JpqWp ,n 

Thus, V is a partitioned block matrix, with (1, m) denoting the row and column position of each 
block and (j, k) denoting the local row and column position of each element within the block V = 
(V, m ), 0 < 1, m < q. Hence, 

m = ‘7 / tqDq(o)pq)cojf n Iq 

Since W is a common factor of every block and D^to^ 1 ) is a common factor of every block in row 
1, V can be factored: 



that is, 

V=(lp® W q ) Ap.qf'W'p® Iq) 

Since, V = S p and T p is the inverse of S p q , the result follows immediately. 

(13b) follows by writing 

TVpq = T p , q (Ip ® l Wq) S p , q T p , q A p> q S p , q T p , q ('M'p ® Iq) S p , q T p , q 

= (Tp, q (^p ® ^q) Sp, q) (Tp, q Ap, q Sp, q) (Tp, q ('H'p ® Iq) S p , q ) T p , q 

and then applying (3) and (12). 

Complete Factor Theorem 

Clearly the fundamental factor theorem can be iterated if either of the integers p, q is not prime. For 
example, take p = p 2 , q = p P then 
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‘H'piP, = T P2 , p , (Ip,® %) A p , p , (^ P2 ®I P1 ) 


Now take p = p 3 , q = p 2 Pj in the fundamental theorem giving 

'Wpjpjpi = Tp,, pjp, (ip3® ^P2Pl) ^PJ. P2P1 (*^p3®^P»Pl) 

Inserting the expression given above for ‘W'pjp, and using the distributive law, 

K,m, - Tp,. BP , (Ip,®T b , p,l(lp,p,®'»'p,)('p,® A p ! . p^p,® 1 ^® 1 *) V w, («W®W) 

To continue the process, only a proper notation is necessary. To this end, consider an indefinite 
sequence of positive integers, { Pj } , each Pj > 1- Define arrays and q^ by 

r, = 1 

r j + 1 = Pj r j’ J - 1 

so that 

tj = n Pk, j ^ 2 
k = 1 

q„.n= l,n>0 

q n , J =P ) + 1 q n . j + p0^j <n 

so that 

q n ,j= ft Pk> n > 1 
k = j + 1 

Further, define 

M„ = fl Pk, n > 1 
k = 1 

so that 

r n + 1 = q«, 0 = Mn ’ n - 1 

Now note that 

q Jltl . J = Pn +1 q n -j-°^j^ n 

Pj q,.; r J = M n , 1 < j<n 

Then the procedure outlined above yields, by induction, 
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The Complete Factorization Theorem (CFT) Let 


Bmi - Bpi - I p] 

®M n + i = Tp n + 1 , r D + 1 (Ipn + i ® ®M b )» n — V 

then 

*Wk. = B Md fl (Iq,, ® A PJ , r J (I*. , ® % ®y 

j=i 

alternatively, 

j = n 

where 

Cm, C P1 I p , 

Cm.,, = (ipn * i ® Cm.) p. ♦ ,» n — V 


( 14 ) 


(15a) 


(15b) 


Notice that Bm. is a permutation matrix of order M n . Also notice that the notation is not complete 
because the matrix Bm. depends on the ordered set of integers p,, p 2 , • • •, p n > and not simply on 
their product M n . In fact, by induction, 


- n (^,.j ® ^p> *j) 

j=n 


(16) 


Since matrix multiplication is not commutative, any permutation of the order of the Pj would 
change the matrix Bm„- Also, the arrays r and q are defined with respect to the ordered sequence 
of p’s. 

Proof: First we prove (15a). First note that n = q n , n = 1, A pi> r , = I p so that the theorem is true for 
n = 1, 2, 3. Assume it is true for n. Then 

= "l^-iM. = 1^p» . , r, , i 

Setting p = p n * ,, p = r n + , = M n in the fundamental factor theorem gives 

i = ^p.»i,r.»i(lp«,i ® 'iHd.) (‘Up.,, ® lr. t i) 
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By the induction hypothesis, 14^ is given by the theorem. Hence, by the distributive law. 


Ipo , i ® - (lp» ♦ 1 ® ®M b ) II ( Ipn + i q B .j ® Ap,, r,) (Ip„ , ! q«,j ® ^p, ® IrJ 

j = l 


But p n+ iq n ,j =%+i.j • Hence > 


— r Ppn*i,rn.i(^P».i ® ®M„) PI (lq D «i.j ® Ap J ,r j )(lq n »,, J ® ® ^r,) ^p^ , i, r n . i (^p,,,, ® Ir„,i) 

J = 1 


The last two factors are simply those for j = n + 1. Therefore, the (15a) is proved. Equation (15b) 
can be proved similarly. 


Computing the DFT using the CFT 

To compute the DFT of a vector, two basic matrix-vector algorithms are needed. 
I. b = (Iq ® 'tVp ® I r )a 

n. C = (l q ® Ap r ) b 
The definitions yield: 


^kpr + lr + m 


P- 1 

^ (ojPajfpr + ar + m 
a= 0 


C-kpr + lr + m ~ COppbkp r + lr + m 


for 0 < k < q, 0 < 1 < p, 0 < m < r, and, for any (3, a» p = exp(2;ti / P). In the binary case. 


I: b a — a« + a« + r 

bp = a p - r • a P 

II: c a = b a j 

Cp = cop} bp 


for a = 2kr + m, b = (2k + l)r + m 
with 0<k<q, 0<m<r 
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Of course, there remains the the final multiplication of a vector by the permutation matrix B^. This 
requires an expression for its column function. To obtain this, we briefly digress into number 

theory. 

Theorem If j is any integer in the range 0 < j < M n then j has the unique representations 


( 1 ) 

( 2 ) 


j = X a k r k 

k = 1 
n 

j= X Mn. k 

k = 1 


with 0 < a k , P k < p k where the a and P are given by: 


( 1 ) 


Ri = 


, oti = j - piRi 


for k = 2, • • •, n, R k = 


Rk- 1 


Pk 


, a k = Rk - 1 - PkRk 


( 2 ) 


iPn 


Qi = 

for k = n, • • •, 2, Qk - 1 = 


> Pn ~ j ' PnQn 


Qk 


Pk - 1 


, Pk - 1 “ Qk " Pk - lQk - 1 


where [x] is the greatest integer less than or equal to x. 
Proof: of (1). The definition of [x] gives 


giving 

and 


0 < a, < p, 


R k— I - 1 < R k < ^- i 
Pk Pk 


gives 


0<a k < p k 
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Next, 


and, by induction, R k < q n k 
Hence, < 1 , that is, = 0. 
Thus, 


Ri< 


J_<Mn 
Pi Pi 


— Qn, 1 


X = air! + £ a k r k 

k = 1 k~2 


— j “ Pl^l + (R-k - 1 “ Pk^k) r k 
k = 2 

n 

= j-piRl+ X (Rk - l r k — Rk r k + l) 

k = 2 

= j-piRi + Rir 2 -R n r n + i 


but Tj = pj, R n = 0, so X a kfk = j. 

k = 1 

(2) is proved similarly. 

The Scrambling Matrix The column function for the scrambling matrix, B,^, 

n 

given. For 0 < j < M n , we have j = X PkQn, k- The column function is 

k= 1 

C B Jj) = X Pk r k 

k = 1 

This is easily proved by induction on n. 

For n = 1, k = 1, q j j = r } = 1 and j = pj = c(j) which is correct since B p = ^ 
matrix. 

Assume the result is true for n. By the recursive definition, 

Bm„„ = Tp^.r.Jlp,., ® B m „) 


can now be 


, the identity 
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and for 0 < j < M , use the representation j = X P^n + 1 , k- Since q n + i, k - P n + 1 V k for 

k = 1 

n 

1 < k < n, we can write j = X pkQn, kPn + l + Pn + l » or j = 1 P »+ 1 + m 


k = 1 


so that 


1 = X Pkq n . k. «i = p n + l 

k = 1 


0<KM n = r m+] ,0<m<p n+1 . 


By definition of the perfect shuffle, 


1 = X Pkq n , k. «* = Pn + 1 

k = 1 


Let D = Ip,, , ® Bm d - The column function for D is 


co(l + mM n ) = mM n + Cb„( 1) = Pn + l r n + l + C B„(1)- 


By the inductive hypothesis 

c B .( l )= X Pkfk 

k = 1 

Hence, 

n n + 1 

CB 0 . 1 U)=crfc 1 (j)] = Pn + irn+l+ X Pk r k = X Pk r k 

k=l k = 1 


Calculation of c^Jj) is easily implemented. If 

n 

j = ^ PlQn, 1 
1=1 

then 

n 

k s c(j) = X Pi r i 
1 = 1 

Define 
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then 


k„,= X Pm, 1 <m<n, k„ = 0 
1=1 


k. = k»-i + P™*™. 1 <m<n,k„ = k 

Set j = 0 


for kj := 0 to r 2 - 1 by rj do 
for k 2 := kj to - 1 by r 2 do 

for k m := km-i to r m+ i - 1 by r m do 

for k n := k n .) to M - 1 by r n do 
begin 
c(j) := k n ; 
j :=j + 1; 

end; 


Alternatively, define J n = X PiQl. n > i ^ m < n, so that Jj = j and J m = J m + l + Pm^n, n , 1 ^ nt < n. 

1 = m 

Thus, set K = 0 


for j„ := 0 to q n , „ _ i - 1 by q n , „ do 
for j n - 1 := j n to q„, n _ 2 - 1 by q n , n - 1 do 

for j m - = jm + i to q n , m - 1 — 1 by q n , m do 

for ji := j 2 to M - 1 by q n , i do 
begin 
c(ji) := k; 
k := k + 1; 

end; 
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For the binary case.p, = 2, = i~ q„. = ■ So if i = 1 Pe2- k . ® S fc * 1. 

J J * J k = 1 


then 


CBU)= X Pk^ k ' 1 > 0 - Pk - 1 

k = 1 


is the column function, which can be written 


CbO)= I P»*l-e2" k 

k = 1 

so B is symmetric and hence B~‘ = B. Thus when j is represented as an n-bit binary number, c(j) is 
obtained from j by simply reversing the order of its bits. For this reason, B is also called the 
bit-reversal matrix. In this case, a simpler notation can be used for the perfect shuffle and square 
deal matrices. Define 


T 2 --Tj, 2 -.j n21 t, =S ,=I, 
S2 n = ^>2, 2 n ] 


n 

Then with j = X p k 2 n " k - the column functions for S and T are 

k = 1 


c-r<j)= X 7k2 n k 

k = 1 

cs(j)= X a k 2 " k 

k = 1 


where Y \ - P n » Yk = ^k - 1 ♦ ^ > ^ 1 _ ^n ’ a n ^1 * °k + 1 

binary notation: j = P \ • • Pn - 1 Pn &* ves 


k < n. Thus writing j in 


ci(j)= PnPlp2--Pn-l 
Cs(j) = P2P3- • PnPl 


So Cj(j) = is obtained by shifting the bits in j one place to the right, end around, and c s (j) is 
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obtained by shifting the bits one place to the left, end around. 

We now introduce one more permutation matrix, the exchange matrix X2", defined by its 
column function c x (j): 


j=£ M n - k 

k = 1 
n 

Cx(j) = £ Xk2" ■ k , Xi = Pn, Xn = Pi, Xk = Pk, 1 < k < n 

k = 1 

Thus, the action of X is to exchange the first and last bits of j. So 


cx(2k + 1)= 2k + ^-1 

1 0<k<M 

cx(2k + M) = ( 2k + l) 4 

c x(j) = j for all other values of j 

Thus, if 


(19) 


b = X a, 

then b is obtained from a simply by moving the appropriate element ±(M/2 - 1). Note that X is 
symmetric. In other words, the odd numbered components in the first half of a are exchanged with 
the even components of the second half of a. Directly from the definitions we have 


B 2 = X 2 = T 2 = S 2 =I 

b 4 = x 4 = t 4 = s 4 
b 8 = x 8 


T 2 “ = x 2 « (i 2 ® t 2 « - ' ) = (t 2 « - »® I 2 ) x 2 - 


S 2 " = (I 2 ® S 2 "-')X2” = X2»(S2»->® I 2 ) 


B 2 n = T 2 ° (I 2 ® B 2 "- ') = (B 2 “- '® I 2 ) T 2 ” 


( 20 ) 


B 2 n = T 2 " (I 2 ® B 2 ” - >) = (B 2 n ■ '® I 2 ) T 2 ” 


= (I 2 ® B 2 b ■ ') S 2 ” = S 2 n (B 2 °- '® I 2 ) 

= (l 2 ® B2 n ~ 2 ® I 2 ) X 2 ° = X 2 " (l 2 ® b 2 ° ■ 2 ® I 2 ) 
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By induction, these yield, 


n -2 0 

T 2 * = n (i 2 )®x 2 j-")= n (x 2 -j®i2 j ) 

j=0 j=n-2 

n - 2 0 

s 2 .=n(x 2 -'®i*>= n U2 j ® x 2’-') 

j=0 j=n-2 


, n > 2 


( 21 ) 


B 2 n 


H-' 

]~[ (I 2 j ® X 2 ° • 2 > ® I 2 ') I 
j = o \ 

o 

(I 2 J ® X 2 " 2 < ® I 2 <) 

HlY' 


, n > 2 


(22) 


So scrambling, shuffling, and dealing can be done purely in terms of exchange matrices. 
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